Figure 1 

The novel gene as identified through RACE analysis (894 bp) 

GGGAGTGGAGTGAGGGGTAACAAGATGGCGACCGAGACGGTGGAGCTCCATAAGCTAA 

AGCTTGCCGAACTAAAGCAAGAATGTCTTGCTCGTGGTTTGGAGACCAAGGGAATAAAG 

CAAGATCTTATCCACAGACTCCAGGCATATCTTGAAGAACATGCTGAAGAGGAGGCAAAT 

GAAGAAGATGTACTGGGAGATGAAACAGAGGAAGAAGAAACAAAGCCCATTGAGCTCCC 

TGTCAAAGAGGAAGAACCCCCTGAAAAAACTGTTGATGTGGCAGCAGAGAAGAAAGTGG 

TGAAAATTACATCTGAAATACCACAGACTGAGAGAATGCAGAAGAGGGCTGAACGATTCA 

ATGTACCTGTGAGCTTGGAGAGTAAGAAAGCTGCTCGGGCAGCTAGGTTTGGGATTTCT 

TCAGTTCCAACAAAAGGTCTGTCATCTGATAACAAACCTATGGTTAACTTGGATAAGCTG 

AAGGAAAGAGCTCAAAGATTTGGTTTGAATGTCTCTTCAATCTCCAGAAAGTCTGAAGAT 

GATGAGAAACTGAAAAAGAGGAAGGAGCGATTTGGGATTGTCACAAGTTCAGCTGGAAC 

TGGAACCACAGAGGATACAGAGGCAAAGAAGAGGAAAAGAGCAGAGCGCTTTG 

GCCTGATGAAAAGTTCCTGATACTTTCTGTTCTCCAGTC 

TTGGTCACATATATGCCTAAATGCACAGTCATGTGCCTACGTCCTGCCTCGCAATGAGG 
GAGCATGTACCCCAGGTACATCCATGAACTGCGGCAGCAGTTTGACTTATTGCTGTTTCA 
GCTTTAAGGTTGTTGTGTTTTTGTTTTTGATTATGTTGCTTGTTAATAAAAAAAAATAGAAA



Figure 2 

Amino acid sequence as translated from the novel gene (210 amino acids) 

MATETVELHKLKLAELKQECLARGLETKGIKQDLIHRLQAYLEEHAEEEANEEDVLGDETEEE 
ETKPIELPVKEEEPPEKTVDVAAEKKVVKITSEIPQTERMQKRAER FNVPVSLESK KAARAAR 
FGISSVPTK GLSSDNKPMVNLDKLKERAQR FGLNVSSISRK SEDDEKLKKRKER FGIVTSSAG 
TGTTEDTEAKKRKRAERFGIA 



Underlined sequences are amino acid sequences obtained by MS/MS analysis. 
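
The correspondence between the nucleotide sequence of Figure 1 and the amino acid sequence of Figure 2 can be checked with a short translation sketch. The Python below is an illustration only and is not part of the original figures: the names `translate_orf` and `FIG1_FIRST_LINE` are hypothetical, the standard genetic code is assumed, and translation is assumed to start at the first ATG. Only the first line of the Figure 1 sequence is used as input.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna):
    """Translate from the first ATG to the first in-frame stop codon.

    Assumption: the first ATG is the start codon. Whitespace is ignored,
    and codons containing ambiguous bases are reported as 'X'.
    """
    dna = "".join(dna.split()).upper()
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if residue == "*":  # stop codon ends the open reading frame
            break
        protein.append(residue)
    return "".join(protein)


# First line of the Figure 1 sequence (58 nt).
FIG1_FIRST_LINE = "GGGAGTGGAGTGAGGGGTAACAAGATGGCGACCGAGACGGTGGAGCTCCATAAGCTAA"

print(translate_orf(FIG1_FIRST_LINE))  # prints MATETVELHKL
```

The output matches the first eleven residues of the Figure 2 sequence. The same function could be applied to the full Figure 1 sequence once its lines are joined.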



Figure 3 



The sequence of the novel gene amplified through long-distance PCR and used to 
construct the expression vector (873 bp) 

TGGAGTGAGGGGTAACAAGATGGCGACCGAGACGGTGGAGCTCCATAAGCTAAAGCTT 

GCCGAACTAAAGCAAGAATGTCTTGCTCGTGGTTTGGAGACCAAGGGAATAAAGCAAGA 

TCTTATCCACAGACTCCAGGCATATCTTGAAGAACATGCTGAAGAGGAGGCAAATGAAG 

AAGATGTACTGGGAGATGAAACAGAGGAAGAAGAAACAAAGCCCATTGAGCTCCCTGTC 

AAAGAGGAAGAACCCCCTGAAAAAACTGTTGATGTGGCAGCAGAGAAGAAAGTGGTGAA 

AATTACATCTGAAATACCACAGACTGAGAGAATGCAGAAGAGGGCTGAACGATTCAATGT 

ACCTGTGAGCTTGGAGAGTAAGAAAGCTGCTCGGGCAGCTAGGTTTGGGATTTCTTCAG 

TTCCAACAAAAGGTCTGTCATCTGATAACAAACCTATGGTTAACTTGGATAAGCTGAAGG 

AAAGAGCTCAAAGATTTGGTTTGAATGTCTCTTCAATCTCCAGAAAGTCTGAAGATGATG 

AGAAACTGAAAAAGAGGAAGGAGCGATTTGGGATTGTCACAAGTTCAGCTGGAACTGGA 

ACCACAGAGGATACAGAGGCAAAGAAGAGGAAAAGAGGAGAGCGCTTTGGGATTGCCT 

GATGAAAAGTTCCTGATACTTTCTGTTCTCCAGTGTTTTCCATTTCTCTCCTTCTTCTTGG 

TCACATATATGCCTAAATGCACAGTCATGTGCCTACGTCCTGCCTCGCAATGAGGGAGC 

ATGTACCCCAGGTACATCCATGAACTGCGGCAGCAGTTTGACTTATTGCTGTTTCAGCTT 

TAAGGTTGTTGTGTTTTTGTTTTTGATTATGTTGCTTGTTAAT 



Figure 4 
22 cycles of PCR 



26 cycles of PCR 



Figure 5 



P-151 5'-Untranslated Region 



1 75 

CAGGGGCAGCAGTGATTATCTGAACTCGGATCTTTAAAATTGTGGTAGCTCTAAAGCTGATGATGTCTGGTTAGG 

******************* ********************* 

76 150 
AAGTGGCTCTTGCCCGCCCCAGCCCCACCGCCAGTTCCTTAAGCCCGCCCCATGCCCCTCCCAGCTTCCTCCTCA 
*************************************************************************** 

151 225 
TGTTCATCGGTTTTTTCAGGGCTCCCTTCAACGCTCCCCTCTCAGTATTTAGGTCACCACTCCCTCGGCGCCCCT 
*************************************************************************** 

226 300 
TTCGCCTCCCACCATTTTTCCTCAGCAACCCTTACAGTCTTTGCAGCTCCTACCTGCCAGCTCAGATCCCCGTCC 
******************************************************** 

301 375 
GGCT ATGGGCGCGGCGCCGGCTACCACACCTGAAGTCTCCAGGAAGTAA CGCCTCTCCTTCTGCCCCTTTCCTGT 

376 450 
TGGAGGAACAGAATCAGCGCTGCCACCACCCATTGGTTGGTGGTCTGT AATGCAGAAGCACAGTTGGTTGCCATT 

451 525 
TCTGTCGTTCGCAAGATACAGTGCCCGCCCCTCTCCCAGTTCCACCTTTTGA AAGAGGTGGGGCTNAGCTGCCTAG 

526 600 
AGAAGTGAGAGCGACGTCAGCTATTGACCA ATGGGAAGAGCTGATGGTATGGCGTGGGAGCAAGAGTGA CAACGA 

601 675 
TTGGTCAGCCTTGCATCTCTACGCCTAAGGCGGGAACTCCTGGAGGCGGAGGCCGCGGGTGGGGGGAGTGGAGTG 

676 

AGGGGTAACAAGATG P151 coding region 



(Total length: 690 bp) 



Sequence with asterisk: the 274 bp fragment 



Underlined sequences are the minicistrons, or uORFs, upstream of the start of the P151 coding 
region, with the start and stop codons in bold. 
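
A uORF of the kind marked in Figure 5 is an ATG followed, in the same reading frame, by a stop codon. The Python sketch below shows one way such candidates could be located. It is an illustration under stated assumptions, not the method used to prepare the figure: the names `find_uorfs` and `FRAGMENT` are hypothetical, every ATG is treated as a possible start, and the example input is only the first 53 nt of the line beginning at position 301.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr):
    """Return (start, end, sequence) for each ATG that reaches an in-frame stop.

    Coordinates are 0-based and end-exclusive, relative to the input
    string after whitespace has been removed.
    """
    utr = "".join(utr.split()).upper()
    orfs = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the ATG until a stop codon appears.
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3, utr[start:pos + 3]))
                break
    return orfs


# Fragment taken from the Figure 5 line that starts at position 301.
FRAGMENT = "GGCT ATGGGCGCGGCGCCGGCTACCACACCTGAAGTCTCCAGGAAGTAA CGCC"

for start, end, seq in find_uorfs(FRAGMENT):
    print(start, end, seq)
```

For this fragment the sketch reports a single 45 nt candidate that begins with ATG at offset 4 and ends with TAA, which corresponds to the span set off by spaces in the figure line.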



